Quantifying OpenMP: Statistical Insights into Usage and Adoption
In high-performance computing (HPC), the demand for efficient parallel
programming models has grown dramatically since the end of Dennard Scaling and
the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice
due to its simplicity and portability, offering a directive-driven approach for
shared-memory parallel programming. Despite its wide adoption, however, there
is a lack of comprehensive data on the actual usage of OpenMP constructs,
hindering unbiased insights into its popularity and evolution. This paper
presents a statistical analysis of OpenMP usage and adoption trends based on a
novel and extensive database, HPCORPUS, compiled from GitHub repositories
containing C, C++, and Fortran code. The results reveal that OpenMP is the
dominant parallel programming model, accounting for 45% of all analyzed
parallel APIs. Furthermore, it has demonstrated steady and continuous growth in
popularity over the past decade. Analyzing specific OpenMP constructs, the
study provides in-depth insights into their usage patterns and preferences
across the three languages. Notably, we found that while OpenMP usage concentrates on a strong "common core" of constructs (the rest of the API is used far less), new adoption trends are emerging as well, such as the simd and target directives for accelerated computing and task for irregular parallelism.
Overall, this study sheds light on OpenMP's significance in HPC applications
and provides valuable data for researchers and practitioners. It showcases
OpenMP's versatility, evolving adoption, and relevance in contemporary parallel
programming, underlining its continued role in HPC applications and beyond.
These statistical insights are essential for making informed decisions about
parallelization strategies and provide a foundation for further advancements in
parallel programming models and techniques.
Scope is all you need: Transforming LLMs for HPC Code
With easier access to powerful compute resources, there is a growing trend in
the field of AI for software development to develop larger and larger language
models (LLMs) to address a variety of programming tasks. Even LLMs applied to
tasks from the high-performance computing (HPC) domain are huge in size (e.g.,
billions of parameters) and demand expensive compute resources for training. We
found this design choice confusing - why do we need large LLMs trained on
natural languages and programming languages unrelated to HPC for HPC-specific
tasks? In this line of work, we aim to question design choices made by existing
LLMs by developing smaller LLMs for specific domains - we call them
domain-specific LLMs. Specifically, we start off with HPC as a domain and
propose a novel tokenizer named Tokompiler, designed specifically for
preprocessing code in HPC and compilation-centric tasks. Tokompiler leverages
knowledge of language primitives to generate language-oriented tokens,
providing a context-aware understanding of code structure while completely avoiding the human semantics attributed to code structures. We applied Tokompiler to pre-train two state-of-the-art models, SPT-Code and PolyCoder, on a Fortran code corpus mined from GitHub. We evaluate the performance of these models
against conventional LLMs. Results demonstrate that Tokompiler significantly enhances code completion accuracy and semantic understanding compared to traditional tokenizers, reducing normalized perplexity to approximately 1. This research opens avenues for further advancements in
domain-specific LLMs, catering to the unique demands of HPC and compilation
tasks.
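The core idea — generating language-oriented tokens from language primitives while erasing human-chosen names — can be sketched in a few lines of Python. This is a deliberately simplified toy, not the actual Tokompiler algorithm: keywords survive verbatim, every distinct identifier is replaced by a consistent placeholder, and the structural tokens are left intact.

```python
import re

# A small subset of Fortran keywords, kept verbatim; everything else
# alphabetic is treated as a user-chosen identifier. (Toy list, not
# Tokompiler's real vocabulary.)
KEYWORDS = {"program", "end", "do", "integer", "real", "if", "then",
            "call", "subroutine"}

def anonymize(code):
    """Replace identifiers with consistent placeholders (var_0, var_1, ...).

    Code structure is preserved, but the human semantics carried by
    variable names are removed -- a crude approximation of
    language-oriented tokenization.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    mapping = {}
    out = []
    for tok in tokens:
        if tok.lower() in KEYWORDS:
            out.append(tok.lower())          # language primitive: keep
        elif tok[0].isalpha() or tok[0] == "_":
            mapping.setdefault(tok, f"var_{len(mapping)}")
            out.append(mapping[tok])         # identifier: anonymize
        else:
            out.append(tok)                  # literal/operator: keep
    return out

# The same loop written with any variable names produces identical tokens:
print(anonymize("do i = 1, n\n  total = total + i\nend do"))
```

Because two snippets that differ only in naming map to the same token stream, a model trained on such tokens must rely on code structure rather than on hints hidden in identifiers.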
MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers
Automatic source-to-source parallelization of serial code for shared and
distributed memory systems is a challenging task in high-performance computing.
While many attempts have been made to translate serial code into parallel code for a shared memory environment (usually using OpenMP), none has succeeded for a distributed memory environment. In this paper, we propose a novel approach,
called MPI-rical, for automated MPI code generation using a transformer-based
model trained on approximately 25,000 serial code snippets and their
corresponding parallelized MPI code out of more than 50,000 code snippets in
our corpus (MPICodeCorpus). To evaluate the performance of the model, we first
break down the serial-to-MPI parallel code translation problem into two sub-problems, each with its own research objective: code completion (given a location in the source code, predict the MPI function for that location) and code translation (predict both an MPI function and its location in the source code). We evaluate MPI-rical on the MPICodeCorpus dataset and on real-world scientific code benchmarks, comparing its performance between the code completion and translation tasks. Our experimental results
show that while MPI-rical performs better on the code completion task than the
code translation task, the latter is better suited for real-world programming
assistance, in which the tool suggests the need for an MPI function regardless
of prior knowledge. Overall, our approach represents a significant step forward
in automating the parallelization of serial code for distributed memory
systems, which can save valuable time and resources for software developers and
researchers. The source code used in this work, as well as other relevant
sources, are available at:
https://github.com/Scientific-Computing-Lab-NRCN/MPI-rica
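The distinction between the two sub-problems can be made concrete with a small Python sketch. The snippet, function names, and scoring rules below are hypothetical illustrations of the task formulation, not the actual MPICodeCorpus format or MPI-rical's evaluation code.

```python
# A serial snippet with one location (line index 1) where an MPI call
# belongs; the gold annotation records which call and where.
snippet = [
    "int rank;",
    "/* missing MPI call */",
    'printf("%d\\n", rank);',
]
gold = {"function": "MPI_Comm_rank", "line": 1}

def score_completion(pred_function, gold):
    """Code completion: the location is given as input, so only the
    MPI function name must be predicted."""
    return pred_function == gold["function"]

def score_translation(pred_function, pred_line, gold):
    """Code translation: both the MPI function and its insertion
    point must be predicted -- strictly harder than completion."""
    return pred_function == gold["function"] and pred_line == gold["line"]

# Naming the right function but misplacing it is credited under
# completion, not under translation:
print(score_completion("MPI_Comm_rank", gold))        # correct function
print(score_translation("MPI_Comm_rank", 0, gold))    # wrong location
```

This asymmetry is why translation is the more realistic assistance setting described in the abstract: a real tool must decide where an MPI call is needed, not just which one.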
Split chloramphenicol acetyl-transferase assay reveals self-ubiquitylation-dependent regulation of UBE3B
Split reporter protein-based genetic selection systems are widely used to identify and characterize protein-protein interactions (PPIs). Assembly of split markers that antagonize toxins, rather than being required for synthesis of missing essential metabolites, facilitates seeding cells at high density and selective growth. Here we present a newly developed split chloramphenicol acetyltransferase (split-CAT)-based genetic selection
system. The N-terminus fragment of CAT is fused downstream of the protein of interest
and the C-terminus fragment is tethered upstream of a postulated protein partner. We
demonstrate the system's advantages for the study of PPIs. Moreover, we show that
co-expression of a functional ubiquitylation cascade where the target and ubiquitin are
tethered to the split-CAT fragments results in ubiquitylation-dependent growth on
selective media. The fact that proteins do not have to be purified from bacteria and the
high sensitivity of the split-CAT reporter, enable the detection of challenging protein
cascades and post-translational modifications. In addition, we demonstrate that the split-CAT system responds to small molecule inhibitors and molecular glues (GLUTACs).
The absence of ubiquitylation-dependent degradation and deubiquitylation in E. coli significantly simplifies the interpretation of the results. We demonstrate that the split-CAT system provides a readout for the known self-ubiquitylation-dependent inactivation of
NEDD4. Subsequently, we harnessed the system to explore whether UBE3B, a HECT ligase not belonging to the Nedd4 subfamily, is also regulated by self-ubiquitylation. We found
that self-ubiquitylation of UBE3B at residue K665 inactivates the enzyme in the E. coli
system and in mammalian cells due to its oligomerization.